IN THE CLAIMS:

Please revise the claims as follows: 

1. (Currently amended) A method of converting a document corpus containing an ordered
plurality of documents into a compact representation in memory of occurrence data, said 
method comprising: 

developing a first vector for said entire document corpus, said first vector being a 
listing of integers corresponding to terms in said documents such that each said document in 
said document corpus is sequentially represented in said listing; 

developing a second vector for said entire document corpus, said second vector 
indicating the location of each said document's representation in said first vector; and 

developing a third uninterrupted listing for said entire document corpus, said third 
uninterrupted listing containing a sequential listing of floating point multipliers, each said 
floating point multiplier representing a document normalization factor for a corresponding 
document in said document corpus.
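
As an informal illustration only (not part of the claims), the three structures recited in claim 1 might be sketched as follows. The function name `build_corpus_vectors`, the toy dictionary, and the whitespace tokenization are assumptions made for this sketch. The second vector is modeled here as start offsets plus a final end marker, which is one possible reading of "location", and the normalization factor uses the formula recited in claim 4.

```python
import math
from collections import Counter

def build_corpus_vectors(documents, dictionary):
    """Return (terms, offsets, norms) for an ordered list of documents.

    terms   : flat list of integer term ids, documents laid end to end
    offsets : offsets[i] is where document i starts in `terms`;
              a final entry marks the end of the last document
    norms   : one floating point normalization factor per document
    """
    terms, offsets, norms = [], [], []
    for text in documents:
        offsets.append(len(terms))
        # Hypothetical tokenization: split on whitespace, keep dictionary terms only.
        ids = [dictionary[w] for w in text.split() if w in dictionary]
        terms.extend(ids)
        counts = Counter(ids)
        sum_sq = sum(c * c for c in counts.values())
        # Reciprocal of the square root of the sum of squared term counts.
        norms.append(1.0 / math.sqrt(sum_sq) if sum_sq else 0.0)
    offsets.append(len(terms))
    return terms, offsets, norms

docs = ["cat dog cat", "dog bird"]
vocab = {"cat": 0, "dog": 1, "bird": 2}
terms, offsets, norms = build_corpus_vectors(docs, vocab)
```

In this sketch, document i occupies `terms[offsets[i]:offsets[i + 1]]`, so no per-document container is needed.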

2. (Canceled) 

3. (Currently amended) The method of claim 1, further comprising:

rearranging, in said first vector, an order of said unique integers within the data for 
each said document so that all identical unique integers are adjacent. 
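
One way to picture this rearranging step, offered only as a sketch: sort each document's slice of the first vector in place. Sorting is an assumption here; any grouping that makes identical integers adjacent would fit the claim language. The name `group_identical_terms` and the sample data are hypothetical.

```python
def group_identical_terms(terms, offsets):
    """Sort each document's slice so equal term ids sit next to each other."""
    grouped = list(terms)  # work on a copy; the input list is left untouched
    for start, end in zip(offsets, offsets[1:]):
        grouped[start:end] = sorted(grouped[start:end])
    return grouped

terms = [0, 1, 0, 1, 2]   # two documents laid end to end
offsets = [0, 3, 5]       # document starts, plus a final end marker
grouped = group_identical_terms(terms, offsets)
```

Document boundaries are unchanged by this step, so the second vector remains valid.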

4. (Currently amended) The method of claim 1, wherein said normalization factor is
calculated as: 

$NF = 1/\left(\sum_i x_i^2\right)^{1/2}$, where $x_i$ is the number of occurrences of a specific term in said
document, so that NF represents the reciprocal of the square root of the sum of squares of all 
term occurrences in said document. 
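
As a worked illustration only, assume a document in which one term occurs twice and another term occurs once (values chosen for this example, not taken from the application):

\[
NF = \frac{1}{\sqrt{2^2 + 1^2}} = \frac{1}{\sqrt{5}} \approx 0.447
\]

Multiplying the counts (2, 1) by this factor gives approximately (0.894, 0.447), a vector of unit length.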

5. (Previously presented) A method of converting, organizing, and representing in a 
computer memory a document corpus containing an ordered plurality of documents, said 
method comprising: 

for said document corpus, taking in sequence each said ordered document and 
developing a first uninterrupted listing of integers to correspond to an occurrence of terms in 
the document corpus. 

6. (Currently amended) The method of claim 21, further comprising:

developing a third uninterrupted listing for said entire document corpus, said third 
uninterrupted listing containing a sequential listing of floating point multipliers, each said 
floating point multiplier representing a document normalization factor for a corresponding 
document in said document corpus.

7. (Currently amended) The method of claim 5, further comprising:

for each said document in said document corpus, rearranging said unique integers so 
that any identical integers are adjacent. 

8. (Original) The method of claim 6, wherein said normalization factor is calculated as:

$NF = 1/\left(\sum_i x_i^2\right)^{1/2}$, where $x_i$ is the number of occurrences of a specific term in said
document, so that NF represents the reciprocal of the square root of the sum of squares of all 
term occurrences in said document. 

9. (Previously presented) An apparatus for organizing and representing in a computer 
memory a document corpus containing an ordered plurality of documents, said apparatus 
comprising: 

an integer determining module receiving in sequence each said ordered document of 
said document corpus and developing a first uninterrupted listing of unique integers to 
correspond to an occurrence of terms in the document corpus. 

10. (Currently amended) The apparatus of claim 23, further comprising:

a normalizer developing a third uninterrupted listing for said entire document corpus, 
containing a sequential listing of floating point multipliers, each said floating point multiplier 
representing a document normalization factor for a corresponding document in said 
document corpus. 

11. (Original) The apparatus of claim 9, further comprising: 

a rearranger rearranging said unique integers so that any identical integers for each 
said document in said document corpus are adjacent. 

12. (Original) The apparatus of claim 10, wherein said normalizer calculates said
normalization factor as: 

$NF = 1/\left(\sum_i x_i^2\right)^{1/2}$, where $x_i$ is the number of occurrences of a specific term in said
document, so that NF represents the reciprocal of the square root of the sum of squares of all 
term occurrences in said document. 

13. (Previously presented) A signal-bearing medium tangibly embodying a program of 
machine-readable instructions executable by a digital processing apparatus to perform a 
method to organize and represent in a computer memory a document corpus containing an 
ordered plurality of documents, said method comprising: 

developing a first uninterrupted listing of unique integers to correspond to the 
occurrence of terms in the document corpus. 

14. (Previously presented) The signal-bearing medium of claim 25, wherein said method 
further comprises: 

developing a third uninterrupted listing for said entire document corpus, containing a 
sequential listing of floating point multipliers, each said floating point multiplier representing 
a document normalization factor for a corresponding document in said document corpus. 

S/N 09/848,430
Docket: ARC920000023US1

15. (Previously presented) A data converter for organizing and representing in a computer
memory a document corpus containing an ordered plurality of documents, for use by a data
mining applications program requiring occurrence-of-terms data, said representation to be
based on terms in a dictionary developed for said document corpus and wherein each said
term in said dictionary has associated therewith a corresponding unique integer, said data 
converter comprising: 

means for developing a first uninterrupted listing of said unique integers to 
correspond to the occurrence of dictionary terms in the document corpus; and

means for developing a second uninterrupted listing for said entire document corpus 
containing in sequence the location of each corresponding document in said first 
uninterrupted listing, wherein said first listing and said second listing are provided as input 
data for said data mining applications program. 
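
To suggest how a consuming program might read the two listings, the following sketch recovers one document's term counts from them. It is an assumption-laden illustration, not the claimed means: the helper `document_term_counts` is hypothetical, and the second listing is assumed to hold start offsets followed by a final end marker.

```python
from collections import Counter

def document_term_counts(terms, offsets, doc_index):
    """Count term occurrences for one document using only the two listings."""
    start, end = offsets[doc_index], offsets[doc_index + 1]
    return Counter(terms[start:end])

terms = [0, 1, 0, 1, 2]   # first listing: term ids for the whole corpus
offsets = [0, 3, 5]       # second listing: where each document starts
counts = document_term_counts(terms, offsets, 0)
```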

16. (Original) The data converter of claim 15, further comprising: 

means for developing a third uninterrupted listing for said entire document corpus, 
containing a sequential listing of floating point multipliers, each said floating point multiplier 
representing a document normalization factor for a corresponding document in said 
document corpus. 

17. (Original) The data converter of claim 15, further comprising: 

means for rearranging said unique integers so that any identical integers for each said 
document in said document corpus are adjacent. 

18. (Previously presented) The method of claim 1, further comprising: 

developing a dictionary comprising said terms contained in said document corpus; and

associating, with each said dictionary term, an integer to be uniquely corresponding to
said dictionary term, said uniquely corresponding integers being said integers comprising said 
first vector. 
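
A minimal sketch of such a dictionary, assuming whitespace tokenization and integers assigned in order of first appearance (both are choices made for this illustration; the claim does not prescribe them). The name `build_dictionary` is hypothetical.

```python
def build_dictionary(documents):
    """Map each distinct term to a unique integer, in order of first appearance."""
    dictionary = {}
    for text in documents:
        for word in text.split():
            if word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary

vocab = build_dictionary(["cat dog cat", "dog bird"])
```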

19. (Canceled) 

20. (Previously presented) The method of claim 5, further comprising: 

developing a dictionary comprising terms contained in said document corpus; and

associating, with each said dictionary term, an integer to be uniquely corresponding to
said dictionary term, said uniquely corresponding integers used in said first uninterrupted
listing.

21. (Previously presented) The method of claim 5, further comprising:

developing a second uninterrupted listing for said entire document corpus, said 
second uninterrupted listing containing, in sequence, the location of each corresponding 
document in said first uninterrupted listing. 

22. (Previously presented) The apparatus of claim 9, further comprising: 

a dictionary developing module to develop a dictionary of terms contained in said 
document corpus, each said term being associated with a corresponding unique integer. 

23. (Previously presented) The apparatus of claim 9, further comprising:

a locator module developing a second uninterrupted listing for said entire document 
corpus, said second uninterrupted listing containing, in sequence, the location of each 
corresponding document in said first uninterrupted listing. 

24. (Previously presented) The signal-bearing medium of claim 13, wherein said method 
further comprises: 

developing a dictionary comprising terms contained in said document corpus; and

associating, with each said dictionary term, an integer to be uniquely corresponding to
said dictionary term, said uniquely corresponding integers used in said first uninterrupted
listing.

25. (Previously presented) The signal-bearing medium of claim 13, wherein said method 
further comprises: 

developing a second uninterrupted listing for said entire document corpus, said 
second uninterrupted listing containing, in sequence, the location of each corresponding 
document in said first uninterrupted listing. 